Technology in Language Testing
F. Heidary
A review of the literature on computers in language testing reveals four recurring sets of issues: (a) item banking, (b) computer-assisted language testing, (c) computer-adaptive language testing, and (d) the effectiveness of computers in language testing.

Item Banking

Item banking covers any procedures that are used to create, pilot, analyze, store, manage, and select test items so that multiple test forms can be created from subsets of the total "bank" of items. With a large item bank available, new forms of tests can be created whenever they are needed. Henning (1986) describes how item banking was set up for the ESL Placement Examination at UCLA.

While the underlying aims of item banking can be accomplished by using traditional item analysis procedures, a problem often arises because the groups of people used in piloting the items differ in ability, especially when they are compared to the population of students with whom the test is ultimately to be used. However, a relatively new branch of test analysis theory, called item response theory (IRT), eliminates the need for exactly equivalent groups of students when piloting items because IRT analysis yields estimates of item difficulty and item discrimination that are "sample-free." IRT can also provide "item-free" estimates of students' abilities.

A serious limitation of IRT is the large number of students that must be tested before it can responsibly be applied. Typically, full item analysis (that is, analysis of two or three parameters) is feasible only when the number of students tested is very large by the standards of most language programs, that is to say, in excess of one thousand. Smaller samples in the hundreds can be used only if the item difficulty parameter alone is studied.
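To make the "two or three parameters" concrete, the following is a minimal sketch that assumes the commonly used logistic form of an IRT model; the text above does not spell out a formula, so the function name `irt_probability` and the symbols `theta`, `b`, `a`, and `c` are illustrative assumptions rather than anything taken from Henning (1986) or a particular software package.

```python
import math

def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability that a student of ability theta answers an item correctly.

    b = item difficulty, a = item discrimination, c = lower asymptote
    (the chance of a correct guess). All names are illustrative assumptions.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# One parameter: only the difficulty (b) is estimated for each item.
p_one = irt_probability(theta=0.5, b=0.0)

# Two parameters: difficulty plus discrimination (a).
p_two = irt_probability(theta=0.5, b=0.0, a=1.8)

# Three parameters: difficulty, discrimination, and guessing (c).
p_three = irt_probability(theta=0.5, b=0.0, a=1.8, c=0.25)

print(round(p_one, 3), round(p_two, 3), round(p_three, 3))
```

Each additional parameter has to be estimated from the pilot responses for every item, which is one way to see why the two- and three-parameter analyses call for the larger samples mentioned above.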
Minimal item banking can be done without computers by using file cards, and, of course, the traditional item analysis statistics can be calculated (using the sizes of groups typically found in language programs) with no more sophisticated equipment than a hand-held calculator. Naturally, a personal computer can make both item banking and item analysis procedures much easier and much faster. For example, standard database software (e.g., Microsoft Access, 1996; or Corel Paradox, 1996) can be used to do the item banking. For IRT analyses, more specialized software will be needed.

Computer-Assisted Language Testing

Tests that are administered at computer terminals, or on personal computers, are called computer-assisted tests. Receptive-response items (including multiple-choice, true-false, and matching items) are fairly easy to adapt to the computer-assisted testing medium. Relatively cheap authoring software like Testmaster (1988) can be used to create such tests. Even productive-response item types (including fill-in and cloze) can be created using authoring software like Testmaster. Unfortunately, the more interesting types of language tasks (e.g., role plays, interviews, compositions, oral presentations) prove much more difficult to develop for computer-assisted testing.

However, advancing technologies have many potential ramifications for computer-assisted language testing. Brown (1992a) outlined some of the technological advances that may have an impact on language teaching and testing:

Consider the multi-media combinations that will be available in the very near future: CD-ROM players working with video-image projectors, and computers controlling the whole interactive process between students and machines for situations and language tailored to each student's specific needs.... Consider the uses to which computer communications networks could be put. What about scanners and hand-writing recognition devices?
Won't voice sensitive computers and talking computers be valuable tools in the language media services of the future? (p. 2)

The new technologies such as the CD-ROM and interactive video discussed in Brown (1992a) do make it possible for students to interact with a computer. Hence, no technical reason remains why interactive testing like role plays, interviews, compositions, and presentations cannot be done in a computer-assisted mode. Naturally, the expense involved may impose some limits, and the scoring will probably continue to involve rater judgments (thus, further increasing the expense involved). But at least, the logistics of gathering the language samples can now be simplified by the use of computer-assisted testing procedures.

Two consequences may evolve from the current advances in technology: (a) the sophistication of existing computer hardware and software tools will continue to grow, and (b) the cost of the technology will continue to drop (eventually to within reach of all language programs). Hence, the possibilities for developing productive-response computer-assisted language tests will definitely increase.

But, why should we bother to create computer-assisted language tests at all? Aren't they really just a sophisticated version of the paper-and-pencil tests that they will probably be modeled on? Two primary benefits can be gained from computer-assisted language testing:

1. Computer-assisted language tests can be individually administered, even on a walk-in basis. Thus group-administered tests and all of the organizational constraints that they impose will no longer be necessary.
2. Traditional time limits are not necessary. Students can be given as much time as they need to finish a given test because no human proctor needs to wait around for them to finish the test. No doubt, cheating will arise, but such problems can be surmounted if a little thought and planning are used.
Given the advantages of individual, time-independent language testing, computer-assisted testing will no doubt prove to be a positive development. Consider the benefits of a writing test administered in a computer laboratory as the final examination for an ESL writing course. Such a computer-assisted test would be especially suitable for students who had been required to do all of their writing assignments in the course on a PC. In such a course, it would make eminent sense to allow the students to do their final examination writing samples on a computer and turn in the diskette at the end of the testing period (or send the file by modem or network to the teacher). Under such circumstances, the testing period could be quite long to allow time for multiple revisions. Of course, logistical problems will crop up, but they can no doubt be overcome with careful planning. In fact, the literature indicates that computers can be an effective tool for teaching writing (Neu & Scarcella, 1991; Phinney, 1991).

Computer-Adaptive Language Testing

Computer-adaptive language tests are a subtype of computer-assisted language tests because they are administered at computer terminals or on personal computers. The computer-adaptive subtype of computer-assisted tests has three additional characteristics: (a) the test items are selected and fitted to the individual students involved, (b) the test is ended when the student's ability level is located, and, as a consequence, (c) computer-adaptive tests are usually relatively short in terms of the number of items involved and the time needed. As Madsen (1991) put it, "The computer-adaptive language test (CALT) is uniquely tailored to each individual. In addition, CALT is automatically terminated when the examinee's ability level has been determined.... The result is a test that is more precise yet generally much shorter than conventional paper-and-pencil tests" (p. 237).
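The three characteristics just listed (items fitted to the student, termination once the ability level is located, and a short test as a result) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the procedure used in any of the studies cited: it assumes a one-parameter (difficulty-only) logistic model, estimates ability as a posterior mean over a grid with a standard normal prior, and stops when the uncertainty of that estimate falls below a chosen threshold. The names `run_adaptive_test`, `estimate_ability`, `se_target`, and `simulated_student` are hypothetical.

```python
import math

def p_correct(theta, b):
    """One-parameter logistic probability of a correct answer (an assumption)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Candidate ability values from -4.0 to 4.0 in steps of 0.1.
GRID = [g / 10.0 for g in range(-40, 41)]

def estimate_ability(responses):
    """Return (posterior mean, posterior SD) of ability.

    responses is a list of (item difficulty, answered correctly) pairs;
    a standard normal prior over GRID is assumed.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # prior weight
        for b, correct in responses:
            p = p_correct(theta, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_adaptive_test(bank, answer, se_target=0.4, max_items=30):
    """Administer items one at a time, each chosen to match the current estimate.

    bank: list of item difficulties; answer(b) returns True or False.
    Stops when the estimate is precise enough, or at max_items.
    """
    remaining = list(bank)
    responses = []
    theta, se = 0.0, float("inf")
    while remaining and len(responses) < max_items and se > se_target:
        # (a) pick the unused item whose difficulty is closest to the estimate
        item = min(remaining, key=lambda b: abs(b - theta))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = estimate_ability(responses)
    # (b) the loop ends once the ability level is located closely enough
    return theta, se, len(responses)

# Hypothetical bank of 61 items and a simulated student who answers
# correctly whenever the item difficulty is at or below 1.0.
bank = [d / 10.0 for d in range(-30, 31)]

def simulated_student(b):
    return b <= 1.0

theta, se, n_items = run_adaptive_test(bank, simulated_student)
print(f"estimate={theta:.2f}, SE={se:.2f}, items used={n_items} of {len(bank)}")
```

Because each item is chosen near the current estimate, the simulated student never sees most of the bank, which mirrors characteristic (c): the adaptive test uses fewer items than a fixed form drawn from the same bank would.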
A clear description of how to develop computer-adaptive language tests (CALTs) is provided in Tung (1986). CALT development relies very much on item response theory. While the computer-adaptive language test is taking place, the computer typically uses a combination of item response theory and the concept of flexi-level tests (Lord, 1980) to create a test specifically designed for the individual student taking it. The flexi-level procedures roughly determine the general ability level of the student within the first few test questions. Then, based on item response statistics, the computer selects items which are suitable for the student's particular level and administers those items in order to get a more finely tuned estimate of the student's ability level. This flexi-level strategy eliminates the need (usually present in traditional fixed-length paper-and-pencil tests) for students to answer numerous questions that are too difficult or too easy for them. In fact, in a CALT, all students take tests that are suitable to their own particular ability levels, tests that may be very different for each student.

Madsen (1991) describes an example of a CALT which was applied to students at Brigham Young University in Utah for testing reading and listening abilities. The Madsen (1991) study indicates that many fewer items are necessary in administering computer-adaptive language tests than are necessary in pencil-and-paper tests and that the testing time is correspondingly shorter. For example, the CALT in Madsen (1991) used an average of 22.8 items to adequately test the students in an average of 27.2 minutes. The comparable conventional reading test used in the study required 60 items and 40 minutes.

Educational Testing Service (ETS) is providing considerable leadership in the area of what they are calling computer-based tests. That organization is already offering the GRE and PRAXIS as computer-based tests in 180 countries.
In 1998, a computer-based version of the TOEFL examination will be released in North America and selected countries abroad, though paper-and-pencil versions will continue to be used until computer delivery is available.

Effectiveness of Computers in Language Testing

Educational Testing Service (1996) claims the following advantages for their new computer-based TOEFL:
*#Further enhancements to test design before 2000
*#Greater flexibility in scheduling test administrations
*#Greater standardization of test administration conditions
*#Portions of test individualized to examinee ability level
*#Inclusion of writing with every test administration
*#Examinee choice of handwriting or typing essay
*#Ability to record multiple aspects of examinee test-taking behavior
*#Platform for future innovations in test design and services (p. 5).

Judging by what they are claiming, at least portions of this new computer-based test will eventually be computer-adaptive. Brown (1992b) looked in more detail at both the advantages and disadvantages of using computers in language testing.

''Advantages.'' The advantages of using computers in language testing can be further subdivided into two categories: testing considerations and human considerations. Among the testing considerations, the following are some of the advantages of using computers in language testing:
*#Computers are much more accurate at scoring selected-response tests than human beings are.
*#Computers are more accurate at reporting scores.
*#Computers can give immediate feedback in the form of a report of test scores, complete with a printout of basic testing statistics.
*#IRT and computer-adaptive testing allow testers to target the specific ability levels of individual students and can therefore provide more precise estimates of those abilities.
*#The use of different tests for each student should minimize any practice effects, studying for the test, and cheating.
*#Diagnostic feedback can be provided very quickly to each student on those items answered incorrectly if that is the purpose of the test. Such feedback can even be fairly descriptive if artificial intelligence is used.

Among the human considerations, the following are some advantages of using computers in language testing:
*#The use of computers allows students to work at their own pace.
*#CALTs generally take less time to finish than traditional paper-and-pencil tests and are therefore more efficient.
*#In CALTs, students should experience less frustration than on paper-and-pencil tests because they will be working on test items that are appropriate for their own ability level.
*#Students may find that CALTs are less overwhelming (as compared to equivalent paper-and-pencil tests) because the questions are presented one at a time on the screen rather than in an intimidating test booklet with hundreds of test items.
*#Many students like computers and even enjoy the testing process (Stevenson & Gross, 1991).

''Disadvantages.'' The disadvantages of using computers in language testing can also be further subdivided into two categories: physical considerations and performance considerations. Among the physical considerations, the following are some of the disadvantages of using computers in language testing:
*#Computer equipment may not always be available, or in working order. Reliable sources of electricity are not universally available.
*#Screen capacity is another physical consideration. While most computers today have overcome the 80-character by 25-line restrictions of a few years ago, the amount of material that can be presented on a computer screen is still limited. Such screen size limitations could be a problem, for example, for a group of teachers who wanted to develop a reading test based on relatively long passages.
*#In addition, the graphics capabilities of many computers (especially older ones) may be limited, and even those machines that do have graphics may be slow (especially the cheaper machines). Thus, tests involving even basic graphs or animation may not be feasible at the moment in many language teaching situations.

Among the performance considerations, the following are some of the disadvantages of using computers in language testing:
*#The presentation of a test on a computer may lead to different results from those that would be obtained if the same test were administered in a paper-and-pencil format (Henning, 1991). Some limited research indicates that there is little difference for math or verbal items presented on computer as compared with pencil-and-paper versions (Green, 1988) or on a medical technology examination (Lunz & Bergstrom, 1994), but much more research needs to be done on various types of language tests and items.
*#Differences in the degree to which students are familiar with using computers or typewriter keyboards may lead to discrepancies in their performances on computer-assisted or computer-adaptive tests (Hicks, 1989; Henning, 1991; Kirsch, Jamieson, Taylor, & Eignor, 1997).
*#Computer anxiety (i.e., the potentially debilitating effects of computer anxiety on test performance) is another potential disadvantage (Henning, 1991).

References

Brown, J. D. (1996). ''Testing in language programs''
Madsen, H. S., & Larson, J. W. (1986). ''Technology and language testing''